School District Evaluation: Database Warehouse Support 1 

July 1996 

Eugene P. Adcock, Ph.D. 

Reginald Haseltine 

Research, Evaluation and Accountability 
Prince George’s County Public Schools 

Educational decision-making relies upon evaluation as the only way to make rational 
choices between alternative practices, to validate educational improvements, and to build a stable 
foundation of effective practices as a safeguard against faddish but ineffective innovations. 
Additionally, policies of private and government agencies currently make approval of systemic 
reform programs, new educational initiatives, and research grants contingent upon evidence of 
good planning and sound evaluation procedures. Consequently, local school districts are looking 
for ways to upgrade their evaluation support systems in the face of increasingly complex data 
environments and more stringent demands for higher quality evaluation reporting. 

The overall problem is that quality decision making requires uniform, timely, and 
accurate educational information. The practical problem is to provide successful assimilation of 
many years of accumulated, complex and ambiguous data from a wide variety of sources, 
optimize the data into reliable and meaningful data elements, and structure the data for optimal 
control, management, query and extraction. The solution is to upgrade school district evaluation 
offices with relational database capabilities, to establish a database warehouse support system for 
evaluation and research, and to support this system with access to all pertinent data sources 
within the district. The need to have a ready pool of reliable and valid data to support a school 
district’s multiple evaluation needs can be met through the institutionalization of the same new 
data warehouse technology currently being developed and applied to a variety of successful 
commercial enterprises. 



1 Paper presented at the Summer Data Conference, National Center for Education 
Statistics, The U.S. Department of Education, Office of Educational Research and 
Improvement, Washington D.C., July 24-26, 1996. 
From the perspective of staff responsible for fulfilling the evaluation needs of a large 
public school system, this paper presents an inside look at our current evaluation experience and 
how a database warehouse system has been developed as an indispensable evaluation data 
support tool for fulfilling the information demands of contemporary public school systems. 
Particular attention is given to an often overlooked, but critical evaluation issue: What are the 
requirements for data used in evaluation, and how can data be prepared to meet these 
requirements? Also, an overview of the design and operational characteristics of the Research 
and Evaluation Assimilation Database (READ) warehouse support system for evaluation 
activities in the Prince George’s County Public Schools (PGCPS) system is presented. 



The School District of the 1990's 

The 1990s have been characterized by rapidly changing school environments, reform 
programs and systemic initiatives. In such an environment, public school evaluation offices can 
no longer wait until questions arrive to begin gathering pertinent evaluation data. The increasing 
public demand to hold schools accountable for their impact on student outcomes lends urgency to 
the task of establishing an evaluation response system with a pool of available data for statistical 
information processing. Also, the quality of reported results characterized by the historical 
practice of providing simple data aggregation and profile information often falls short of the 
objective and unambiguous results now required by decision makers. Instead, the quality of 
evaluation results desired and expected comes from a process which arranges data on the basis of 
scientific design methodology and analyzes data using appropriate statistical procedures. 

The authors of this paper contend that the modern evaluation office needs to add the 
evaluation data support capabilities of a relational database warehouse system to its office 
infrastructure. We have applied modern data warehouse system technology that has been 
developed by computer scientists William Inmon 2 and Richard Hackathorn 3 to capture, manage, 
and use the rich supply of years of accumulated school district data for research and evaluation 
purposes. In the four stage READ data warehouse pipeline, we have successfully adapted this 
business oriented technology to the public school environment. The READ warehousing system 
supplies a ready pool of accurate and reliable data that improves the overall efficiency and 
effectiveness of the evaluation office in a very demanding public school environment. 

Figure 1 presents the database warehouse evaluation support structure that has been 
developed in the Research, Evaluation and Accountability (REA) office of the Prince George’s 
County Public Schools (PGCPS). Within this evaluation structure, the READ warehousing 
support serves as the legacy 4 data capturing agent, data scrubbing and enhancement agent, and 
the evaluation data delivery agent to the statistical analysis stage of the evaluation office. These 
READ services have developed into such an indispensable evaluation support tool that they have 
effectively reshaped the entire infrastructure and operational characteristics of the evaluation 
office. 



2 Inmon, William H., Building the Data Warehouse. Wiley-QED, NY, 1992. 

3 Inmon, W. H., & Hackathorn, R., Using the Data Warehouse. Wiley-QED, NY, 
1994. 

4 A legacy system is an established online transaction processing system that serves 
a specific management purpose within an enterprise. Typical school system 
legacy systems include payroll, personnel, instructional data systems (processes 
student course schedules and report cards), and pupil accounting and school 
boundaries (processes student-school enrollment status and attendance). 
Figure 1: READ-Based Evaluation Office Model 



READ Evaluation Data Requirements 

In 1981, a Joint Committee issued one of the most significant documents to date in the 
field of educational evaluation, entitled Standards for Evaluations of Educational Programs, 
Projects, and Materials. 5 It consisted of a set of 30 standards to be used both to guide the 
conduct of evaluation of educational programs, projects, and practices and also to judge the 
soundness of such evaluations. The 30 standards are grouped according to four attributes of an 
evaluation - its utility, its feasibility, its propriety, and its accuracy. The evaluation accuracy 
standards, in particular, served as our guide in the development of the READ data gathering and 
quality control activities. 

5 Standards for Evaluations of Educational Programs, Projects, and Materials. 
Developed by the Joint Committee on Standards for Educational Evaluation, 
McGraw-Hill, 1981. 

The Standards require that the information obtained for an evaluation be 
technically adequate and linked logically to the evaluation objectives. Technical specifications 
for adequate evaluation data usually exceed those used by the legacy data sources from which 
REA collects raw data. Quality assurance is given such importance that the READ data 
warehousing pipeline has dedicated substantial resources to data verification, documentation, 
scrubbing and enhancement activities. 

According to the Standards, evaluation data must clearly identify and characterize the 
program or practice under examination (independent, dependent, and treatment data), and 
embody the contextual characteristics of the program under examination (e.g., size, scope, and 
time). The proactive data collection features of data warehousing, however, require that enough 
of the right kind of data be collected prior to any evaluation study proposal. Thus, a data 
collection scheme, based upon the most commonly required educational evaluation contextual 
and educational practice variables had to be devised for the READ data warehousing system. 

This READ data collection scheme focuses on the following five core database entities: 
student, teacher, school, program and instructional finance. These core database entities and the 
total database structure are presented in detail in the READ Technical Manual. 

Metadata, or “data about data,” is a built-in component of the READ warehousing system. 
The metadata component fulfills the following quality assurance provision of the Standards: “The 
sources of information should be described in enough detail so that the adequacy of the 
information can be assessed.” Metadata is a particularly useful quality control component when 
it comes to using previous year data to perform post-hoc evaluation studies or trend analyses. 
Storing information about the legacy sources of data as READ metadata provides another 
important management tool for this evaluation support system. 

Experience has taught the READ staff that there is often a considerable gap between the quality 
requirements for evaluation data and the condition of raw data received from legacy sources. 
Still, the Standards’ requirement for systematic data control states: “The data collected, 
processed, and reported in an evaluation should be reviewed and corrected, so that the results of 
the evaluation will not be flawed.” (Standards (1981), D7: Systematic Data Control, emphasis 
added). Thus, the READ system has devised sophisticated data scrubbing procedures which 
examine both the physical and statistical characteristics of data in order to ensure that evaluation 
data quality standards are being met. The next section presents an overview of the four stage 
READ system with a particular emphasis on preparing school system data for evaluation use. 
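
The distinction between physical and statistical checks can be illustrated with a minimal Python sketch. It is not taken from the READ system: the field names, allowed ranges, and the three-standard-deviation cutoff are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev

def physical_check(record, spec):
    """Return a list of problems with field presence, type, and allowed range.

    `spec` maps a field name to (type, low, high); the layout is hypothetical.
    """
    problems = []
    for field, (ftype, low, high) in spec.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not isinstance(value, ftype):
            problems.append(f"{field}: expected {ftype.__name__}")
        elif not (low <= value <= high):
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    return problems

def statistical_outliers(values, z_cutoff=3.0):
    """Return indexes of values lying more than z_cutoff sample standard
    deviations from the mean (an assumed, simple screening rule)."""
    if len(values) < 2:
        return []
    m, s = mean(values), stdev(values)
    if s == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - m) / s > z_cutoff]

# Hypothetical specification for two data elements.
SPEC = {"grade": (int, 0, 12), "days_absent": (int, 0, 180)}
```

A record that passes the physical check can still be flagged by the statistical screen, which is why the two kinds of examination are kept separate in this sketch.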



READ Function and Flow 

On the local school district level, the purpose of the Research and Evaluation 
Assimilation Database (READ) is to fulfill the input data requirements for the evaluation design 
and statistical analysis operations of the Research, Evaluation and Accountability (REA) office 
of the Prince George’s County Public Schools (PGCPS) system. The REA mission is to provide 
fair and scientifically valid approaches to the evaluation of school and program effects. REA 
fulfills the school district’s evaluation needs through the development of the READ warehousing 
system which is a proactive “end-to-end” evaluation system with four major data processing 
phases: 1) collect and confirm; 2) “scrub” and enhance; 3) structure and store; and 4) analyze and 
report. Figure 2 shows the four stage evaluation support system which characterizes READ. 




Figure 2: READ Four Stage Pipeline Processing System 

(Stages shown in the figure, top to bottom: Analysis and Reporting; Structure and Storage; 
Scrubbing and Enhancement; Collection and Confirmation.) 



One way to visualize how school system data flows through the READ data warehouse 
pipeline is to see how data is “pulled” from station to station in response to detailed control 
specifications. That is, the evaluation process “works in reverse,” from an evaluation question 
backwards to a pool of proactively captured and prepared data. The system design “pull” of 
READ-based evaluation data begins with the decision support needs stipulated at stage four 
(“Analysis and Reporting”) which, in turn, extracts data from stage three (“Structure and 
Storage”) which has prepared data received from stage two (“Scrubbing and Enhancement”), that 
was initially acquired from the various legacy data stores of the school district in the first stage 
(“Collection and Confirmation”). Thus, the function of the READ system is established by the 
decision support evaluation concerns of the school district and NOT the Management 
Information System (MIS) department. 

Preparing School System Data for Evaluation Use 

Quality assurance activities occur at each data transition point along the four stages of the 
READ data pipeline shown in Figure 2. The control specifications detail the file, record and data 
element parameters of the receiving station. The READ Technical Manual provides examples of 
different types of control specification request forms used by the REA staff at each READ 
pipeline station (e.g., “Data Request Form,” “Data Transfer Form,” and “Record Specification 
Form”). The type and specificity of the form used to control the data transfer depends upon the 
point in the pipeline where data is being transferred. The “Record Specification Form” used to 
build a Sufficient Statistics Matrix (SSM) 6 file for hierarchical linear modeling school effects 
evaluation, for example, is very specific and exacting because it requires “drilling across” several 
READ tables. Extracting an SSM file requires expert relational database skills and specific 
knowledge of the data warehouse structure and content. Also, an SSM file extraction undergoes 
the highest level of quality assurance processing before any analysis at READ station four 
(“Analysis & Reporting”). 


Figure 3 graphically shows the multi-step procedure used to build an SSM file from 
value-added, student-centric data warehouse data. The researcher makes an SSM file request 
based upon the evaluation needs of scientific design methodology and the statistical analysis 
quality conditions necessary to empirically address a decision support situation. The ability to 
readily extract an SSM file from the READ warehouse represents the most important function of 
the READ database support system. 

6 A Sufficient Statistics Matrix is an efficiently assimilated input data file for 
statistical analysis. An SSM is constructed to meet the scientific design 
requirements to address the evaluation question and to possess the quality 
characteristics to yield reliable results. The term SSM is borrowed, and adapted, 
from Bryk, Raudenbush, and Congdon (SSI) 1996. 



Processing to Build Evaluation Data for a 
Sufficient Statistics Matrix (SSM) File 




Figure 3: READ Model For Building Evaluation Data 



Figure 3 shows the READ system data flow from pipeline stage two (“Scrubbed Data”), 
to stage three (“Data Warehouse”), and on to stage four (“Statistical Analysis”) with particular 
emphasis on the structuring procedures used to build an SSM file for evaluation analysis. New 
scrubbed and formatted data arrives at the data warehouse as an “inflow” process. Storage in the 
data warehouse requires reformatting and partitioning of data into subject oriented database 
tables, such as course tables, enrollment history, and student characteristics. In the example 
provided in Figure 3, evaluation data records are constructed for analysis of the relationship 
between student demographic characteristics, student enrollment history and student mathematics 
course achievement. That is, an SSM file is constructed from evaluation records which 
assimilate scrubbed, structured, and stored data by “drilling across” the following summarized 
and partitioned subject tables: pupil accounting student demographic table (PAGRAPHIC), 
student “vested” enrollment history table (VESTyy), and student mathematics course 
matriculation history table (MAXMATHyy). This simple example can be expanded as necessary 
in response to particular needs for any decision support evaluation. Again, it is this capability to 
produce proactively prepared evaluation data on demand that is the most important contribution 
of database warehouse technology to evaluation decision support operations. 
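
The “drilling across” step described above can be sketched as a relational join. The example below uses Python's built-in sqlite3 module. Only the table names PAGRAPHIC, VESTyy, and MAXMATHyy come from the text; the column names, the reading of “yy” as a two-digit school year (here 95), the treatment of “vested” as a 0/1 flag, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory database with three hypothetical subject tables keyed on student_id.
ssm_conn = sqlite3.connect(":memory:")
ssm_conn.executescript("""
CREATE TABLE PAGRAPHIC (student_id INTEGER PRIMARY KEY, gender TEXT, grade INTEGER);
CREATE TABLE VEST95    (student_id INTEGER PRIMARY KEY, school_id INTEGER, vested INTEGER);
CREATE TABLE MAXMATH95 (student_id INTEGER PRIMARY KEY, course_code TEXT, final_mark REAL);

INSERT INTO PAGRAPHIC VALUES (1, 'F', 9), (2, 'M', 9), (3, 'F', 10);
INSERT INTO VEST95    VALUES (1, 101, 1), (2, 101, 0), (3, 102, 1);
INSERT INTO MAXMATH95 VALUES (1, 'ALG1', 3.0), (3, 'GEOM', 3.5);
""")

def build_ssm_rows(conn):
    """Join the three subject tables into one evaluation record per student.

    Inner joins keep only students present in all three tables; the
    vested = 1 filter is an assumed inclusion rule, not the READ definition.
    """
    query = """
        SELECT p.student_id, p.gender, p.grade,
               v.school_id, m.course_code, m.final_mark
        FROM PAGRAPHIC AS p
        JOIN VEST95    AS v ON v.student_id = p.student_id
        JOIN MAXMATH95 AS m ON m.student_id = p.student_id
        WHERE v.vested = 1
        ORDER BY p.student_id
    """
    return conn.execute(query).fetchall()

ssm_rows = build_ssm_rows(ssm_conn)
```

Each returned tuple is one flat evaluation record combining demographic, enrollment, and mathematics course fields, which is the shape a statistical package would expect as input.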

READ warehouse pipeline procedures require data scrubbing for all incoming data. 
Scrubbing data to evaluation requirement specifications often involves enhancement or “value 
added” processing and summarization of newly acquired legacy data, and the analysis and 
reporting responsibilities of the REA office require the highest degree of quality assurance and 
quality control of data extracted from READ. Data “scrubbing” is the second data management 
stage in the READ pipeline. During this scrubbing stage data undergoes initial standardization, 
normalization and enhancement (or “value adding”) in preparation for uploading to READ 
databases. Data scrubbing includes standardizing the naming syntax, formats, and values 
associated with incoming data elements. 
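
As a rough illustration of standardizing names and values, the following Python sketch maps legacy field names to a single convention and recodes one data element to a fixed code set. The naming convention (upper case with underscores) and the gender code map are hypothetical and are not taken from the READ Technical Manual.

```python
import re

# Hypothetical mapping from legacy spellings to a standard code set.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def standardize_name(raw_name):
    """Convert a legacy field name to upper case with underscore separators."""
    name = re.sub(r"[^A-Za-z0-9]+", "_", raw_name.strip())
    return name.strip("_").upper()

def standardize_value(raw_value, code_map):
    """Look up a value in the code map; None marks it for manual review."""
    return code_map.get(str(raw_value).strip().lower())

def scrub_record(raw_record):
    """Apply name standardization to every field and value
    standardization to the GENDER element, if present."""
    scrubbed = {standardize_name(k): v for k, v in raw_record.items()}
    if "GENDER" in scrubbed:
        scrubbed["GENDER"] = standardize_value(scrubbed["GENDER"], GENDER_CODES)
    return scrubbed
```

Returning None for an unrecognized value, rather than guessing, keeps questionable data visible for review instead of silently passing it downstream.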

Experience has taught READ staff that because of the interconnected structure of a data 
warehouse environment, mistakes in naming syntax cost dearly in staff time to root out and 
correct. Other data enhancement activities include recoding data element values or rearranging 
the existing data values into a well-defined structure, computing new data elements, or adding 
categorical “flags” to an incoming data element. Data enhancement is a vital, proactive data 
preparation phase for the READ system operations and greatly facilitates the REA office 
capabilities to provide quick responses to evaluation questions. The scrubbed and enhanced data 
is pulled or uploaded into pre-designed READ database tables at station three (“Structure and 
Storage”). The control specifications for pulling the data into these READ tables come from the 
tables’ structural design (e.g., data element specifications and characteristics). Data is partitioned 
into subject specific, normalized tables based on category (e.g., courses, tests, demographics, 
enrollment, etc.) and then summarized to meet evaluation data requirements for future statistical 
analysis purposes. As a final step, elements in the READ data tables are defined and 
documented (i.e., metadata) for later retrieval and analysis use. Thus, quality assurance occurs at 
all phases of the READ data pipeline process: data collection and confirmation, data scrubbing 
and enhancement, data storage and structure, and data analysis and reporting. 
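
A minimal Python sketch of the enhancement and documentation steps follows. The data elements (DAYS_PRESENT, DAYS_ENROLLED), the computed ATTENDANCE_RATE, the 0.90 flag threshold, and the metadata layout are all hypothetical; they only illustrate computing a new element, adding a categorical flag, and recording metadata for each derived element.

```python
# Hypothetical metadata entries documenting each derived data element.
METADATA = {
    "ATTENDANCE_RATE": {
        "definition": "DAYS_PRESENT divided by DAYS_ENROLLED, rounded to 3 places",
        "source": "pupil accounting legacy extract (assumed)",
    },
    "LOW_ATTENDANCE_FLAG": {
        "definition": "1 if ATTENDANCE_RATE is below 0.90, else 0",
        "source": "derived from ATTENDANCE_RATE",
    },
}

def enhance(record):
    """Return a copy of the record with a computed element and a flag added."""
    enhanced = dict(record)
    enrolled = record["DAYS_ENROLLED"]
    if enrolled > 0:
        rate = round(record["DAYS_PRESENT"] / enrolled, 3)
        enhanced["ATTENDANCE_RATE"] = rate
        enhanced["LOW_ATTENDANCE_FLAG"] = 1 if rate < 0.90 else 0
    else:
        # No enrollment days: leave derived elements empty rather than guess.
        enhanced["ATTENDANCE_RATE"] = None
        enhanced["LOW_ATTENDANCE_FLAG"] = None
    return enhanced

def undocumented_elements(original, enhanced):
    """List derived elements that have no metadata entry."""
    return [k for k in enhanced if k not in original and k not in METADATA]
```

Checking for undocumented elements at load time is one simple way to keep the metadata in step with the tables it describes.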

READ Form and Design 

Evaluation concerns also drive the form of the database entities which make up the 
READ warehouse support system. The READ system data warehouse is designed and 
maintained to accommodate the evaluation data requirements of decision makers’ questions that 
will be asked some time in the future. This proactive engineering requires the expert perspective 
of experienced evaluation staff familiar with the evaluation demands of public school systems. 

A particularly critical engineering ingredient is the operational definitions for the database 
entities, subentities, elements, and values which database staff use to build and manage the 
READ system. The form, or logical view of the READ system, as currently configured, is 
pictured in the student-centric entity “wheel” displayed in Figure 4. 

Figure 4: READ Student-Centric Entity Relationship: Logical View 

While the actual database infrastructure is substantively more complex, the “wheel” 
relationship depicts the links between tables and data elements within READ. The relationship 
wheel presents the core evaluation entities used to link the most important input, practices, 
programs, and outcomes of the school district. The circular or “wheel” links and the spoke or 
“star” links displayed in Figure 4 show the established (and desired) database relationships 
between the READ entities. Currently, the data entities on the “wheel” are linked either directly 
at the student level (e.g., schools are linked to individual students) or indirectly to the student 
level (i.e., between the entities themselves, for example, core teachers at the elementary school 
level are linked to schools at the elementary school level and courses at the secondary level). 

The goal of the database staff is to fully develop the direct links between individual student 
records and those of the other core entities (i.e., the wheel’s “spokes”). For example, staff are 
currently working on one data project that will establish a direct link between all students and 
their core subject teachers (i.e., mathematics, science, social studies and language), and another 
data project to link all students to a financial index table representing the cost associated with the 
delivery of core subject instruction. In terms of evaluation support, directly linking all entity 
data at the student level will provide the most robust evaluation design possibilities as input, 
practice, and outcome factors can be arranged at the student level of analysis. 



Figure 5: READ Entity-Relationship Diagram 

A simplified Entity-Relationship Diagram (ERD) of a portion of the READ data system 
is presented in Figure 5. An Entity-Relationship Diagram is the traditional development tool 
used to facilitate the logical design phase of a database system. An ERD provides a clear 
representation of all the data elements (i.e., “entities”) and their linking relationships. Entities 
then evolve into relational database tables as part of the physical design phase. As Figure 5 
shows, not all core entities are directly linked to students in the current design (e.g., TEACHER 
is linked to COURSE but not directly to STUDENT). The database design goal is to complete 
the student-centric form of READ by building the direct relationship links between the 
STUDENT entity and the other core entities. 
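
The difference between an indirect and a direct link can be sketched with a small schema in Python's built-in sqlite3 module. The entity names STUDENT, TEACHER, and COURSE come from the text; the ENROLLMENT table, all column names, and the sample rows are hypothetical. In this sketch a student's teachers can only be reached by passing through COURSE, which is the indirect relationship the design goal aims to replace with a direct one.

```python
import sqlite3

erd_conn = sqlite3.connect(":memory:")
erd_conn.executescript("""
CREATE TABLE STUDENT (student_id INTEGER PRIMARY KEY);
CREATE TABLE TEACHER (teacher_id INTEGER PRIMARY KEY);
-- TEACHER is linked to COURSE, not directly to STUDENT.
CREATE TABLE COURSE (
    course_id  INTEGER PRIMARY KEY,
    teacher_id INTEGER REFERENCES TEACHER(teacher_id)
);
-- Hypothetical linking table between STUDENT and COURSE.
CREATE TABLE ENROLLMENT (
    student_id INTEGER REFERENCES STUDENT(student_id),
    course_id  INTEGER REFERENCES COURSE(course_id),
    PRIMARY KEY (student_id, course_id)
);

INSERT INTO STUDENT VALUES (1), (2);
INSERT INTO TEACHER VALUES (10), (11);
INSERT INTO COURSE VALUES (100, 10), (101, 11), (102, 10);
INSERT INTO ENROLLMENT VALUES (1, 100), (1, 101), (1, 102), (2, 101);
""")

def teachers_for_student(conn, student_id):
    """Resolve a student's teachers indirectly, by way of enrolled courses."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.teacher_id
        FROM ENROLLMENT AS e
        JOIN COURSE AS c ON c.course_id = e.course_id
        WHERE e.student_id = ?
        ORDER BY c.teacher_id
        """,
        (student_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

A direct student-to-teacher table could be populated from the output of such a query, so that later evaluation extracts would not need to repeat the two-step join.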



The READ Technical Manual 

A detailed description of the READ entity, element and value operational definitions is 
beyond the scope of this paper. This information is contained in the READ Technical Manual 
(RTM). The first edition of the RTM was completed in July 1996. The RTM is a 
comprehensive description of the evaluation support database, infrastructure, and procedures 
used by the Research, Evaluation and Accountability (REA) staff in the Prince George’s County 
Public Schools (PGCPS) district to perform evaluation studies. This manual captures every 
aspect of the READ System since its inception three years ago. The justification for developing 
a data warehouse for public school program evaluation is discussed. The importance of proactive 
data collection from district legacy data sources is covered in detail, followed by a description of 
the procedures used to integrate historical data in different formats using relational database 
technology. The importance and magnitude of the often-overlooked data preparation steps (i.e. 
data “scrubbing”) are emphasized. READ concepts, including vestedness, 
belongingness, hierarchical cataloging of secondary courses, and maximum core-area student 
course determination, are covered. Precise definitions of entities (e.g., “core teacher”), as 
used in the READ system for evaluation decisions, are given. This manual presents the details 
of the READ system from many perspectives, including the statistical perspective of the 
educational researcher, the database designer, a procedures manual for READ team members, 
and an introductory description of the READ system. Tutorial sections on relational databases 
and data warehousing are included. The proliferation of commercial data warehousing and a 
proposed future READ system are presented. The READ Technical Manual is currently about 
500 pages and includes nine chapters, nine appendices and a bibliography. (See Appendix for 
the READ Technical Manual Table of Contents.) 



Conclusion 



Evaluation is the process of providing information for decision making. Educational evaluation 
applies scientific procedures to collect and structure reliable and valid data which is statistically 
summarized to yield quantitative results to make decisions about educational programs of 
interest. A Structured Query Language (SQL) relational database warehousing system such as 
READ does not alter the established educational evaluation components. Rather, it provides an 
indispensable data pipeline from which the evaluation design and statistical processing phases draw 
and arranges data for analyses. 

The sought after objectivity in evaluation information support for decision making depends upon 
the scientific design and statistical procedures used to control and manage the data associated 
with the program being investigated. Traditional methods of reporting data from a data query or 
aggregation process unencumbered by scientific design and statistical control procedures yield 
results which are open to a variety of interpretations, alternative hypotheses, and unknown 
influences. While good intuition is a wonderful quality, prudent decision makers rely on 
evaluation results whenever available or obtainable. The READ system’s inherent wealth of 
legacy data, data quality control procedures and data management functions greatly facilitate the 
school district’s capabilities to yield objective decision support evaluation results. 